Optimal exact string matching based on suffix arrays

Authors

  • Mohamed Ibrahim
  • Stefan Kurtz
Abstract

Using the suffix tree of a string S, decision queries of the type “Is P a substring of S?” can be answered in O(|P|) time and enumeration queries of the type “Where are all z occurrences of P in S?” can be answered in O(|P|+z) time, totally independent of the size of S. However, in large scale applications as genome analysis, the space requirements of the suffix tree are a severe drawback. The suffix array is a more space economical index structure. Using it and an additional table, Manber and Myers (1993) showed that decision queries and enumeration queries can be answered in O(|P|+log |S|) and O(|P|+log |S|+z) time, respectively, but no optimal time algorithms are known. In this paper, we show how to achieve the optimal O(|P|) and O(|P|+z) time bounds for the suffix array. Our approach is not confined to exact pattern matching. In fact, it can be used to efficiently solve all problems that are usually solved by a top-down traversal of the suffix tree. Experiments show that our method is not only of theoretical interest but also of practical relevance.
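To make the two query types concrete, here is a minimal Python sketch of decision and enumeration queries on a suffix array. It is only a baseline: it uses plain binary search, which costs O(|P| log |S|) per query, and it implements neither the Manber and Myers table nor the optimal method announced in the abstract. The function names (`build_suffix_array`, `find_interval`, `contains`, `occurrences`) and the naive sorting-based construction are assumptions made for illustration.

```python
from typing import List, Tuple


def build_suffix_array(s: str) -> List[int]:
    """Return the start positions of all suffixes of s in lexicographic order.

    Naive construction by sorting the suffixes directly; fine for small
    inputs, far too slow for genome-scale data.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_interval(s: str, sa: List[int], p: str) -> Tuple[int, int]:
    """Return the half-open range [lo, hi) of sa whose suffixes start with p."""
    m = len(p)

    # First binary search: leftmost suffix whose length-m prefix is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Second binary search: leftmost suffix whose length-m prefix is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def contains(s: str, sa: List[int], p: str) -> bool:
    """Decision query: is p a substring of s?"""
    lo, hi = find_interval(s, sa, p)
    return lo < hi


def occurrences(s: str, sa: List[int], p: str) -> List[int]:
    """Enumeration query: all start positions of p in s, in increasing order."""
    lo, hi = find_interval(s, sa, p)
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(contains(text, sa, "nan"), occurrences(text, sa, "ana"))
```

All suffixes that begin with P occupy one contiguous interval of the suffix array, so once the interval is found the z occurrences can be read off directly. The final `sorted` call is only there to report positions in text order and adds O(z log z) to the enumeration.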

Similar resources

Optimal Suffix Tree Construction with Large

The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. In this paper, we present a novel, deterministic algorithm for the construction of suffix trees. We settle the main open problem in the construction of suffix trees: we build suffix trees in linear time for integer alphabet.

Average-optimal string matching

The exact string matching problem is to find the occurrences of a pattern of length m from a text of length n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads...

Fast Approximate String Matching with Suffix Arrays and A* Parsing

We present a novel exact solution to the approximate string matching problem in the context of translation memories, where a text segment has to be matched against a large corpus, while allowing for errors. We use suffix arrays to detect exact n-gram matches, A* search heuristics to discard matches and A* parsing to validate candidate segments. The method outperforms the canonical baseline by a...

Optimal Exact String Matching Based on Suffix Arrays

Using the suffix tree of a string S, decision queries of the type “Is P a substring of S?” can be answered in O(|P|) time and enumeration queries of the type “Where are all z occurrences of P in S?” can be answered in O(|P|+z) time, totally independent of the size of S. However, in large scale applications as genome analysis, the space requirements of the suffix tree are a severe drawback. Th...

String Range Matching

Given strings X and Y the exact string matching problem is to find the occurrences of Y as a substring of X. An alternative formulation asks for the lexicographically consecutive set of suffixes of X that begin with Y. We introduce a generalization called string range matching where we want to find the suffixes of X that are in an arbitrary lexicographical range bounded by two strings Y and Z. ...


Publication date: 2002